Physics-embedded inverse analysis with algorithmic differentiation for the earth’s subsurface

Inverse analysis has been utilized to understand unknown underground geological properties by matching the observational data with simulators. To overcome the underconstrained nature of inverse problems and achieve good performance, an approach is presented with embedded physics and a technique known as algorithmic differentiation. We use a physics-embedded generative model, which takes statistically simple parameters as input and outputs subsurface properties (e.g., permeability or P-wave velocity), that embeds physical knowledge of the subsurface properties into inverse analysis and improves its performance. We tested the application of this approach on four geologic problems: two heterogeneous hydraulic conductivity fields, a hydraulic fracture network, and a seismic inversion for P-wave velocity. This physics-embedded inverse analysis approach consistently characterizes these geological problems accurately. Furthermore, the excellent performance in matching the observational data demonstrates the reliability of the proposed method. Moreover, the application of algorithmic differentiation makes this an easy and fast approach to inverse analysis when dealing with complicated geological structures.

Inverse analysis gives the solution of inverse problems aiming to find unknown properties of an object, or a medium, from observing a response of this object or medium 1 . The inverse analysis process describes finding the matched predictions through a forward model calculation, which takes the parameters describing unknown properties as input, to the observational data 2,3 . A representative example is seismic inversion, which often involves triggering a source wavefield at the earth's surface and collecting the scattered data at receivers from various positions along the surface. Accounting for the received data, it is possible to find the heterogeneous subsurface structures, such as the existence of an oil deposit, a cave, or a mine 1 . In earth science, geological reservoir characterization is of essential value for maximizing oil production from mature hydrocarbon provinces, detecting fluid distributions (groundwater, oil, gas, etc.) 4,5 , and many other important issues affecting our daily lives [6][7][8] . In addition, reservoir properties show spatial heterogeneities from pore to reservoir scale, and it is critical to properly resolve the heterogeneity effects on the underground fluid flow system 9,10 . However, since physical properties can not be observed directly in the field, inverse analysis techniques have to be applied in order to understand the heterogeneous reservoir properties depending on observational data, such as pressure for hydraulic conductivity fields.
In this study, we propose a novel method for inverse analysis, which generalizes different inverse analysis approaches and can include embedded physics understanding. In addition, we test this novel inverse analysis method for different earth science problems: heterogeneous hydraulic conductivity of a groundwater flow system, hydraulic fracture distribution in a gas-producing reservoir, and seismic inverse for subsurface aquifer determination. Porous media has been the source of valuable fluids such as groundwater and petroleum, as well as both liquid and natural gas 11 . In a groundwater flow system, fully understanding the heterogeneous subsurface hydraulic conductivity distribution is of importance for estimation of drinking groundwater utilization and contamination mitigation [6][7][8] . In addition, the earth's subsurface has also been used for the injection of slurried wastes, like hazardous chemicals or radioactive byproducts 12,13 , and certain geological reservoirs have been used for CO 2 storage and recovery 10,[14][15][16][17] . Characterizing the underground structures, which enables the prediction of the fluid flow system behavior, is essential for successfully using geological sources and avoiding environmental contamination for the projects mentioned above. Notably, we focus on two different scales of heterogeneity in this study. Furthermore, among the producing wells drilled in North America since the 1950s, around 70% of gas wells and 50% of oil wells have been hydraulically fractured. Once a hydraulic fracture is generated, fluid in the reservoir will flow out or into the fracture face and then, along the fracture path, flow out www.nature.com/scientificreports/ observational data has been utilized during calibration to reach good performance of the inverse model, regularization, a numerical technique involving adding a term to the objective function, is highly valued for improving results. Adding a regularization term to the objective function seeks to develop additional desired features to the inverse solution, such as smoothness, convexity, or respecting prior knowledge of geologic features. In addition, optimization is the most time-consuming step during inverse analysis, but we apply algorithmic differentiation to increase the computational efficiency 52 . Algorithmic differentiation can compute gradients with a low computational cost for complicated computer programs by applying the chain rule repeatedly. Specifically, reverse model algorithmic differentiation is good at calculating high-dimensional derivatives (in this study, we map a high-dimensional input to a low-dimensional output), which is often useful for inverse analysis problems with substantial computational savings. Because of regularization and algorithmic differentiation techniques, inverse analysis becomes more feasible to estimate interest quantities reasonably based on available data 32 . The remainder of this manuscript describes the workflow of the general physics-embedded inverse analysis, the inverse analysis results for different problems, and the benefits and improvement of this approach in "Methods", "Examples and results", and "Discussion" sections. Finally, we present our conclusion about applying this approach in "Conclusion" section.

Methods
The physics-embedded inverse analysis starts with the physics-embedded generative model generation. Specifically, in this study, the physics-embedded generative models describing the quantities of interest are the heterogeneous hydraulic conductivity distribution, hydraulic fracture distribution, and seismic P-wave velocity. These are the targets of what inverse analysis is trying to predict through observational data matching. Several key factors are picked to represent the variability of the targets through the physics-embedded generative model for the stability test. For example, in the hydraulic fracture problem, five key factors (which can be understood as latent variables) are utilized to represent the lengths of the hydraulic fractures in a cluster, which is correlated with permeability. The physics-embedded generative model describes the relationship between the latent variables to the target properties we are interested in. In addition, the physics-embedded generative model embeds physical knowledge of the system. Continuing with the hydraulic fracture example, once the lengths of the hydraulic fractures are figured out, the permeability of the fractures could be calculated based on the mathematical models, e.g., we apply the fracture size-transmissivity relationship. Finally, we build a model that characterizes the hydraulic fracture permeability distribution based on the representative latent variables. Broadly speaking, the physics-embedded generative model links the small number of latent variables to many target properties of interest, encoding the relationship between them. The physical knowledge embedded in the generative model increases the reliability and accuracy of the inverse analysis.
Once the physics-embedded generative model has been constructed, the second step is the objective function set up, where a forward physical model takes the output from the physics-embedded generative model to simulate the mechanisms of the study system, like fluid flow for hydraulic conductivity fields. Using the hydraulic fracture problem as an example again, a gas production situation has been simulated. The fluid flow from fractures to the pumping well, specifically the pressure drop, is calculated through the forward model. The objective function characterizes the difference between the observational data and the predicted output from the forward model. The final step is performing the inverse analysis using gradient-based optimization with the gradients being computed by algorithmic differentiation. During this step, the output from the forward model is compared with the observational data through the loss function to achieve final optimized results. The detailed workflow is illustrated in Fig. 1.
During this study, the notation p and p are used to represent a physical reservoir property field in vector form, z and ẑ represent the latent variables, and h and ĥ represent a vector of observations for inverse analysis and the calculation from the forward physical model, respectively. This study obtained observations h from different problems directly from the related reference fields p through the forward model. The forward physical model predictions ĥ are obtained based on the guess of target properties p through the iterations. The error used to measure the difference between the true and predicted values should be differentiable for the inverse analysis. This loss takes the simplified form of the sum of squared residuals in the examples studied here.
The optimization problem of inverse analysis is formulated in terms of the key factors or latent variables, ẑ and includes regularization in the objective function. The optimization problem of inverse analysis is formulated in terms of the latent variables, z: where the regularization term varies depending on application, and we call f (z) the objective function. The specific objective function is discussed in the subsections of Examples and Results. During the optimization process, through gradient calculation, seeking the minimum value of the objective function represents the finding of the optimized final results. Regularization adds additional benefits to the inverse analysis, avoiding side effects like overfitting. In addition, algorithmic differentiation is used to compute the gradients for the optimization. Our method not only includes the physics understanding in the inverse analysis, but also finds the potential to conduct optimization calculations for complicated geological problems easily and efficiently, which is algorithmic differentiation. Specifically, the algorithmic differentiation library we applied is Zygote.jl 52 , and we use the differentiable physics simulator, DPFEHM 53 . As for optimizing the objective function, a gradient-based optimization method is utilized, which is the limited-memory Broyden-Fletcher-Goldfarb-Shanno 54 (L-BFGS) method  55 . The Optim.jl 56 software package is specified for this process. Of course, other gradient-based optimization routines could also be used. For all the problems, to start the inverse analysis, the initial guess of key factors is set to be 0.

Examples and results
This study provides a generative approach to physics-embedded inverse analysis. We focus on three problems: heterogeneous hydraulic conductivity field, hydraulic fracture distribution, and seismic inversion of P-wave velocity property. Two types of heterogeneity have been considered for heterogeneous hydraulic conductivity fields. For larger-scale heterogeneous fields, to improve the inverse analysis performance, the ML method was applied. To estimate the performance of inverse analysis, the comparison of reference fields and the final inverse results are conducted for different problems and are presented in Figs. 2, 4, 5 and 6. The convergence of different inverse analyses showing the optimization process is described in the supplementary information, Figs. S1-S4. The comparison of observational data and the prediction of the forward physical model after inverse analysis are shown in the supplementary information, Figs. S5-S7. In Figs. 4, 5 and 6, the comparison results, the first rows are the three "true" reference fields. The following rows demonstrate the inverse results, while the last rows are the difference calculated between the "true" and results estimated by the inverse analysis. On top of the inverse results (second row) the relative error is displayed, which measures how close the inverse result is to the reference field and is defined as where p is the mean of the reference field. Especially for the Gaussian hydraulic conductivity field in Fig. 2, only one reference field (as the Gaussian field is easier to be characterized than the bimodal field, which demonstrates three examples in Fig. 4) has been represented to investigate the performance of the proposed approach.
Principal component geostatistical approach for Gaussian hydraulic conductivity. One of this work's focuses is a heterogeneous hydraulic conductivity field. First, we discuss a multivariate Gaussian field of small-scale heterogeneity in the hydraulic conductivity field. A 200 m × 200 m subsurface aquifer is simulated with a unit thickness. Two hundred eigenvalues z , as latent variables (the principal components), have been introduced to go through a Gaussian distribution, with mean 0, variance 1, and correlation length 50 m, to create the heterogeneity of the research area. We use the GaussianRandomFields.jl package to generate the multivariate Gaussian field for the Julia programming language 57 . The heterogeneous hydraulic conductivity field p (200 × 200) is shown in Fig. 2a. For the physics-embedded generative model, we utilize the principle components of the covariance matrix to represent the Gaussian distribution. The principle components of the covariance matrix are www.nature.com/scientificreports/ calculated by Karhunen-Loève theorem. This embeds knowledge of the statistical structure of the permeability fields. Our approach generalizes existing PCGA and accelerates it through algorithmic differentiation (PCGA was designed for black-box models where algorithmic differentiation is not possible). We intend to establish that our approach can replicate a method familiar to geoscientists (especially hydrologists) and provide better performance. More importantly, the background groundwater flow is simulated through a forward physical model for inverse analysis calibration. The boundary condition yields a constant 5 m head drop from left to right. In addition, in the center of the research area, water is injected at a rate of 1.0 m 3 /s. The observation used to inform the inverse analysis is the hydraulic head, from a static forward Darcy's law and considering mass conservation, on a 16 × 16 regular grid within the domain. Figure 3a shows the reference head distribution, and the positions for all the observations are shown in Fig. 3d. The objective function is specified as the following equation, which considers latent variables as well: For the Gaussian hydraulic conductivity field in Fig. 2, the "true" reference field and the estimated result are similar, particularly for the center area, which indicates the good performance of the inverse analysis. However, the error reaches around one order of magnitude, only existing at the top and bottom edges. We hypothesize www.nature.com/scientificreports/ this is partly due to the fixed pressure boundary conditions, which make the observations less sensitive to the hydraulic conductivity near these boundaries.
ML approach for bimodal hydraulic conductivity. Beyond the Gaussian field, to show our approach generalizes additional methods, we show how it generalizes RegAE 48 . RegAE is a method that can solve more challenging permeability fields than the principal component geostatistical approach. The domain is a 100 m × 100 m subsurface aquifer with a unit thickness. In this type of field, the higher heterogeneity is applied and is represented by two hydrogeologic facies with distinct properties, each of which is a multivariate Gaussian distribution. The two multivariate Gaussian structures are shown as conductivity 1 and 2 in Table 1. More importantly, the "Split" model has a different multivariate Gaussian structure that has been utilized to indicate which of the facies is present at a given location. As a result, the new type of field shows a bimodal hydraulic conductivity distribution, and the reference fields are represented in Fig. 4a, d, and g. In terms of the higher heterogeneity of the bimodal fields, a generative machine learning model VAE is included to capture the hydrogeological properties distribution in this study. VAE 58 is a generative ML model with neural network architecture and has widespread application for image data 59 . VAE consists of two parts: an encoder and a decoder. The encoder step maps a high-dimensional space p (such as pixels in an image) into a www.nature.com/scientificreports/ smaller parameter space. Specifically, the smaller parameter space is the key factor containing the features of the image, which is the hydraulic conductivity distribution in this study, and used to be called the latent variables z . In reverse, the decoder maps the latent variables back to their original high-dimensional space. In the bimodal case, the reference fields have a resolution of 100 × 100 pixels, which is a high dimensional space. The training of VAE goes through these two encoder and decoder steps. The dimensions of the latent variable, z , are 50, 100, and 200, and 100 epochs are performed during the VAE training for characterizing the relationship between the latent variables and the property distribution. For more details on how the VAE was trained, refer to 48 . After the training of the VAE, z is applied to represent the physics meaning of the hydraulic conductivity distribution feature. At the same time, the decoder step accounts for the physics-embedded generative model. Here, the physics is embedded through the process of training the VAE on images that contain the physical understanding of the subsurface-in this case, the two facies. In this study, three z values, 50, 100, and 200, are tested for the inverse analysis performance in terms of key factors. Like the Gaussian case, for bimodal fields, a constant head drop of 1 m from the left boundary to the right has been set up for the fluid flow system. The head distribution is calculated through the forward physical model (Darcy's law) of groundwater dynamics at a static state and considering the mass conservation. Oppositely, a coarse 5 × 5 regular grid within the domain is demonstrated for observation during inverse analysis calibration, shown in Fig. 4a, since VAE is a powerful tool to capture the property distribution with less input information. The regularization term of optimization for bimodal fields takes latent variables covariance into account, and the objective function is defined as: where z is the covariance matrix for the latent variables, and z is the mean of the latent variables. A more detailed description of inverse analysis for Gaussian and bimodal fields is specified in 48 .
More heterogeneous, bimodal hydraulic conductivity fields are simulated through VAE as the physics-embedded generative model, which is good at spacial feature characterization, to test our idea of the physics-embedded inverse analysis; the results are shown in Fig. 4. For bimodal fields in Fig. 4, the broad similarity in each facies between the reference fields and simulated results implies that the inverse analysis approach captures the salient aspects of the hydraulic conductivity distribution features. Even if the relative error is comparably higher than those from Gaussian fields, the phenomena lie in the higher complexity of bimodal fields. Meanwhile, for the difference in Fig. 4c, f, and i, the major error only occurs at the edge of the two facies; oppositely, in each face, the difference is small and close to zero. In conclusion, including the physics-embedded generative model during inverse analysis, even for the ML-specific approach, turns out to be a good application for different types of heterogeneous hydraulic conductivity fields based on having consistently good results. However, even if it approves the application of the physics-embedded inverse analysis, specifically for more complicated fields, like the edges in bimodal fields, it needs more calibration or on-site investigation for future field applications. On the other hand, it also implies that more detailed physics understanding or background should be included when dealing with complicated field situations.
Hydraulic fracture network. Most drilled wells have been fractured in the oil and gas production field, resulting from fluid pressure differences 11,18,19 . Fully understanding the distribution and properties of the hydraulic fractures is of essential importance for production estimation and reservoir protection. In the cases of drilled wells, they are now turned fully horizontally into the target geologic formations. At the same time, for almost all depths of interest, the hydraulic fracture will be normal to the direction of the horizontal well. In this study, a cluster with five hydraulic fractures has been selected to present the process of inverse analysis during gas production 60 . A medium-scale matrix of size 100 m × 100 m, with a 78 m drilled well in the center position, was used to represent the research domain. Hydraulic fractures are in the normal direction to the drilled well and distributed in a constant interval between them. The length of the hydraulic fractures follows a power law, given by 61 where r is the length of hydraulic fracture, p is the power, R 1 and R 0 are the maximum and minimum of the hydraulic fracture length range, and f(r) is the possibility of a certain length r. Based on a literature review of fracture length distributions 62 , the power p is set up to be 1.8, while the length range of fractures spans from 10 m to 90 m in this problem. After the fracture length has been determined, a size-transmissivity relationship, which describes the transmissivity of fractures and shows a positively correlated power law with the length of fractures, is introduced. The size-transmissivity relationship is defined in 63,64 as www.nature.com/scientificreports/ where T is the fracture transmissivity, and α and β are related parameters with values 1.3 * 10 −9 , and 0.5, respectively. In this study, the reservoir has a 10 m thickness. In addition, the permeability of the matrix and the drilled (7) log(T) = log(α * r β ) ,  www.nature.com/scientificreports/ In this study, gas production has been simulated. Especially the gas pumping position is set to be at the right end of the drilled well with a rate of 0.82 m 3 /s based on data from the Marcellus Shale Energy and Environment Laboratory (MSEEL) 65 . The shale layer of gas production is located at a depth of 2300 m, and the temperature and pressure of the subsurface are 75 • C and 15 MPa, respectively. A transient flow model based on mass conversation and Darcy's law is built for the pressure drop calculation, assuming single-phase gas flow based on the experience at MSEEL, which is very dry. Two observation points are located at the two ends of the drilled well, as shown in Fig. 5. As the pumping goes on, the observation lasts for two weeks, with a 30-min frequency of data collection. The objective function for this problem utilizes the same equation as Eq. (4) in "Principal component geostatistical approach" section.
Similar results in the Gaussian fields for the hydraulic fracture problem are shown in Fig. 5. Surprisingly, the high similarity between the reference fields and the simulated results illustrates the characterizing ability of inverse analysis when including the physics understanding of the relationship between fracture distribution and permeability. In addition, the extremely low relative error additionally supports the conclusion. The error in the permeability of the hydraulic fractures mainly exists at the two ends of the fractures and is small. The more we understand the physics background in the hydraulic fracture problem, the higher possibility we can predict the underground fluid flow system and make a more reliable estimation of oil and gas production.

Seismic inversion.
Seismic inversion estimates subsurface properties by matching predicted data generated on a proposed model to observed data collected at receiver locations. This study only tests the inverse analysis from the seismic records h observed at the surface to elastic properties p , which is the velocity at which the P-wave passes through subsurface layers. The research area is an underground reservoir of size 2 km × 1 km, consisting of four horizontal layers. In addition, our physics-embedded generative model embeds the domain knowledge that the velocity tends to increase with depth. Four key factors (latent variables) are developed to www.nature.com/scientificreports/ compute the increasing velocity trend with depth. The reference fields of the geological layer properties ( p with size 2000 × 1000) are shown in Fig. 6a, d, and g. To generate the observed data, a seismic wave has to be triggered; in this application, the location of the source point is fixed at the center of the domain on the surface. Since we constrain our models to only vary with depth and not horizontally, we only use one source per model in our experiments. In this study, the wave has been computed using a finite difference model. In addition, 100 seismic receivers are located symmetrically beside the source position along the surface, as shown in Fig. 6. Data sets are recorded at the receivers for a record length of 0.8 s. The data from the 100 receivers for the whole simulation time has been implemented for calibration during inverse analysis. The objective function for this problem utilizes the same equation as Eq. (4) in "Principal component geostatistical approach".
Finally, we also investigate seismic inversion by applying the physics-embedded inverse analysis. Not surprisingly, the simulated data from the estimated model closely matches the reference field data, indicating a good match between the reference and estimated models. The maximum error is around 11%, which is acceptable for the deep layers. It is more convincing when considering the difference subfigures; the error is only notable for the deepest layer, while all the shallow layers illustrate outstanding results. Hence, we can conclude that the generative approach of physics-embedded inverse analysis is successful for the seismic inversion problem. Meanwhile, one conclusion from these results is that shallow layers have higher reliability during the inverse analysis, and more investigation or calibration is needed when facing deep layers.
Comparison of observational results. Beyond only the comparison between the "true" reference fields and the simulated inverse results, the comparison between the observational data and outputs of the forward model is represented in Figs. 3, S5-S7 to further demonstrate the performance of the physics-embedded inverse analysis. After reviewing all the observation data comparison figures, an inevitable conclusion is that the inverse analysis successfully captures all the observational information features. Therefore, it further approves the application of the physics-embedded inverse analysis for underground reservoir property characterization based on observational information on the earth's surface. Figs. 3 and S1-S4.

Convergence. The convergence results are depicted in the supplementary information in
The convergence is generally obtained from 20 to 120 iterations for all the problems. Generally, the higher accuracy (hydraulic fracture problem) needs more iterations to reach good results; however, this relationship also is affected by the complication of the reference fields and the quality of the physics-embedded generative model. More discussion about the computational time and cost follows in the next section.

Discussion
Inverse analysis is fundamental to help us find the underground structure and the geological properties distribution. However, since the underground situation is complicated and we only have some observational data at specific locations, the inverse analysis is sometimes hard to conduct, and the results accuracy is unreliable sometimes. It is essential to improve the accuracy and reliability of inverse analysis in geosciences. Since our model incorporates the physics of the underlying problems, it reaches accurate final results and decreased model time. This demonstrates the advantages in considering the physics background which is the importance of our study 40,41 . The physics-embedded inverse analysis provides an approach including the physics background understanding to perform inverse analysis effectively. At the same time, the application of algorithmic differentiation shows fast and efficient gradient calculation during optimization, as is discussed below. Our goal here is to demonstrate an inverse analysis approach that uses a physics-embedded generative model by showing how it generalizes some existing methods and can be used more broadly in both subsurface flow problems and seismic inverse problems. Three inverse problems are completed to investigate the accuracy of the proposed approach; mainly, for the heterogeneous hydraulic conductivity fields, we discussed the two types of heterogeneity and the various physics-embedded generative models during the inverse analysis. Generally, the comparison results in Fig. 2, 4, 5 and 6, which show the high similarity between the reference fields and the simulated results, support the discovery of the underground properties using these inverse methods. However, for Gaussian hydraulic conductivity problem, which is based on the statistical characterization of the system as the physics understanding, brings some concerns about accuracy only at the edges of the research area. At the same time, for the more heterogeneous problem, the bimodal fields, including the VAE method, which can thoroughly characterize the property distribution in each face from image data, illustrate its strong capability for the complicated scenario. However, the VAE leading inverse analysis struggles at the boundary of the two facies. Therefore, it shows that more complicated problems need a complete understanding of the physics background to reach the perfect performance of the inverse analysis.
Specifically, the hydraulic fracture problem provides the best inversion results of all the research problems. Understanding the physics mechanism of the fractures generation and distribution convinced us of its immense potential for highly successful inversion of the hydraulic fracture network, which provides the following estimation or protection plans for oil and gas production. However, in this study, the two ends of the fracture also draw attention to more calibration. It indicates the difficulty of inversion when considering the connection of the hydraulic fracture network with the existing natural fracture system. In addition, we only pick one cluster of hydraulic fractures to conduct the inverse analysis; in an actual situation, the production well is several km long, where there are many hydraulic fracture clusters along with it. Similarly, in the scenario about seismic inversion, the low relative error and high similarity illustrate the success of the physics-embedded inverse analysis approach. The error mainly focuses on the deep layers, even if the error is relatively small, which indicates that more consideration may be needed for deeper layers. Again, there are large fractures, caves, and mines in the subsurface www.nature.com/scientificreports/ environment, which creates the discontinuity of the properties of interest and brings difficulty to the inversion. However, our generative physics-embedded inverse analysis provides an approach to easily and rapidly conduct underground property characterization. Our approach generalizes several different inverse analysis approaches (e.g., PCGA and RegAE), and the accuracy of the final results depends on the choice of the generative model. Even if we need to discuss the more complicated problems in the future, the solution would be to improve the physical background understanding, which leads to applying an appropriate generative model to the related simulations. As a result, we only provide some fundamental insight into how to invert the underground geological structures by applying the inverse analysis method. Our analysis was performed on a machine with an Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz with 32 threads and an NVIDIA RTX 2080 Ti GPU only for the VAE training. Except for the VAE training for bimodal fields, all the other problems only need to prepare the three reference fields through the physics-embedded generative model, which does not require much generation time. However, the optimization is most time-consuming, and the time to perform the inverse analysis varies somewhat depending on the reference fields. For example, for the Gaussian hydraulic conductivity problem, each epoch needs around 10 s to finish, while for the hydraulic fracture problem, the average time for each epoch is 10 m. As a result, the inverse analysis process time for all the problems mentioned in this study spans from 5 minutes to 1 day. Furthermore, the gradient calculation dominates the total computational cost of the inverse analysis. However, algorithmic differentiation efficiently improves the computation rate for the gradient calculation. Even if the reduction to z parameters from p interest properties makes the inverse analysis easy and fast, the application of algorithmic differentiation shows its extra benefit. The cost of computing a gradient with finite difference methods is ∼200 model runs (proportional to the number of components of z ), while the cost of computing a gradient including algorithmic differentiation is ∼ 2 model runs on average. Therefore, applying algorithmic differentiation helps speed up these computations by an additional factor of up to ∼100. In our research, there is only one objective function without any optimization constraints, which makes the optimization easier to achieve. In addition, the application of regularization allows for easy optimization. Combining these two features illustrates their vast potential for computational cost-saving for easy and efficient inverse analysis.
This study proposes the generative approach to include the physics-embedded generative model during inverse analysis. The framework can efficiently characterize various underground properties inversion and demonstrate accurate and trustworthy prediction results. Understanding the subsurface geological structure and properties helps in groundwater management and protection, oil and gas production estimation and optimization, and heterogeneous underground structure detection. Our approach provides new avenues of support for achieving good performance for inverse analysis by including the physics-embedded generative model. In addition, with algorithmic differentiation, the optimization can be completed fast and efficiently. Finally, we will explore more complicated and realistic geologic research problems by applying our proposed approach to expand its application in geologic properties inversion.

Conclusion
We have presented the application of an inverse analysis approach with a physics-embedded generative model for underground geological properties characterization, which provides an efficient method of regularization and algorithmic differentiation. In this study, a novel method for inverse analysis is proposed, which generalizes different inverse analysis approaches, and we have tested the application of this approach for various problems. We used four physics-embedded generative models: one based on the principal components arising from the geostatistical structure of the parameter fields, another using a variational autoencoder that was trained on images of the parameter maps, a third that embeds the structure of a hydraulic fracturing well (including a relationship between fracture length and permeability), and a fourth that includes geologic layers with distinct P-wave velocities. As a result, the physics-embedded inverse analysis provides accurate and consistent performance for various inverse problems. Using the physics-embedded generative model in combination with observational data enables to construction of a loss function that can be automatically differentiated. Our approach is computationally efficient and obtains an excellent solution to the inverse problem by easing the regularization process and applying algorithmic differentiation. In the future, different observational strategies need to be discussed to enhance the accuracy for more significantly complicated problems and deliver a high level of reliable inverse results based on an efficient observational plan.

Data availability
A computer program automatically generated all the data used in this manuscript. The code for generating the data, training the data, and performing the inverse analysis is available at https:// github. com/ Orcha rdLANL/ Regul ariza tion-DP-paper. PCGA-ex.jl, RegAE-ex.jl, Fract-ex.jl, and Wave-ex.jl are the running files for the problems presented in "Principal component geostatistical approach"-"Seismic inversion" sections. Especially for the bimodal hydraulic conductivity problem, ex bimodal.jl generates the training data set for VAE training. So these files, ex_bimodal.jl and RegAE-ex.jl need to run in order. In addition, if first time running, the related packages are available at https:// github. com/ Orcha rdLANL.